Abstract: The successful operation of unmanned air vehicles
requires software with a high degree of autonomy. Complex
missions in a changing and potentially unknown environment
can be carried out successfully only if high-level functions
can be performed without human control and intervention.

Autonomy software is highly mission and safety critical:
failures caused by flaws in the software can not only jeopardize
the mission but could also endanger human life (e.g., a crash
of a UAV in a densely populated area). Due to its large size,
high complexity, and use of specialized algorithms (plan- 
ner, constraint-solver, etc.), autonomy software poses specific 
challenges for its verification, validation, and certification. 

We have carried out a survey among researchers and scien- 
tists at NASA to study these issues. In this paper, we will 
present major results of this study, discussing the broad spec- 
trum of notions and characteristics of autonomy software and 
its challenges for design and development. A main focus of 
this survey was to evaluate verification and validation (V&V) 
issues and challenges, compared to the development of “tra- 
ditional” safety-critical software. We will discuss important 
issues in V&V of autonomous software and advanced V&V 
tools which can help to mitigate software risks. Results of this 
survey will help to identify and understand safety concerns in 
autonomy software and will lead to improved strategies for 
mitigation of these risks. 
1. Introduction

The successful operation of unmanned air vehicles (UAVs)
requires autonomous functions on various levels. Even when
a UAV is controlled remotely by a human pilot, autonomous
systems must make sure that the UAV remains safe and
controllable if the command and control radio link is
disrupted. More advanced missions, such as unmanned
surveillance operations, need a higher level of autonomy
because the UAVs are supposed to operate for a longer time
without human control and intervention. Complex missions
even require that the UAV cope successfully with changing
and unknown environments and carry out its operation under
changing operation profiles without human control.

In modern systems, autonomous operation is realized in
software. Autonomy software is typically highly mission and
safety critical: failures caused by flaws in the software can
not only jeopardize the mission but could also endanger
human life. For example, a damaged UAV must be able, without
human help and control, to avoid densely populated areas
during an emergency landing or crash.

Since autonomous operation has become important and desirable
in a multitude of areas, such as robotics, space missions, and
underwater exploration, many approaches to this topic can be
found. However, there is no comprehensive and accepted notion
of the risks of autonomy (or even of what autonomy actually
is), and mature technology that provides good guarantees for
safe and reliable operation is not yet available.

In order to address these issues, the authors carried out a
survey on the topic of autonomy software among NASA experts
in autonomous systems and experts in software engineering
and V&V. The questionnaire contained 24 questions to be answered
numerically in the range from 1 to 5 (disagree, partially
disagree, neutral, somewhat agree, fully agree), as well as a
number of questions for which textual answers were solicited.
In this paper, we present the numerical results as pairs: the
mean value for the autonomy experts and the mean value for
the software engineering experts.
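The pairs of mean values used throughout this paper can be computed from raw answers along the following lines. This is a minimal sketch in Python; the function names `mean_pair` and `format_pair` and the example answers are hypothetical and are not data from the survey.

```python
from statistics import mean

# The five-level answer scale of the questionnaire.
SCALE = {1: "disagree", 2: "partially disagree", 3: "neutral",
         4: "somewhat agree", 5: "fully agree"}

def mean_pair(autonomy_answers, se_answers):
    """Return (A, B): the mean answer of the autonomy experts and of the
    software engineering experts, each rounded to two decimals."""
    for answer in list(autonomy_answers) + list(se_answers):
        if answer not in SCALE:
            raise ValueError(f"answer {answer!r} is outside the 1-5 scale")
    return (round(mean(autonomy_answers), 2), round(mean(se_answers), 2))

def format_pair(pair):
    """Render a pair of means in the A/B notation used in the text."""
    return f"{pair[0]:.2f}/{pair[1]:.2f}"

# Hypothetical answers to a single question (NOT survey data).
print(format_pair(mean_pair([4, 5, 4, 3], [5, 4, 4])))
```

Reporting the two groups separately, rather than one pooled mean, keeps differences of opinion between autonomy experts and software engineering experts visible.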

In the rest of this paper, we present results from this
survey and discuss characteristics of autonomy software as
well as issues in the engineering and verification/validation
of such systems. Finally, we present some techniques and
advanced V&V tools that can help to mitigate the software
risks inherent in autonomy software.
2. Autonomy Software

What is Autonomy Software? 

Subjectively, autonomy software is concerned with the automatic
control of a system (e.g., UAV, spacecraft, robot, rover)
without the need for human intervention or control. A more
detailed look at the attributes usually associated with
autonomy software [6] reveals a broad range from self-diagnosing
to self-managing and self-adapting. The main point here is
that the software actually contains some components for
executing actions and making decisions. In principle, the
autonomy software must be able to reason about the environment
and the system itself. Such a notion of autonomy software,
or autonomic software, as coined by IBM (see footnote 1), is
very broad and high level. Although a variety of work on the
topic of autonomy exists, especially in the area of agent-based
systems (e.g., [3]) or autonomic computing, the authors tried to
obtain a more practical definition of what autonomy software
actually is.

In the survey, a number of software engineering experts and
experts in the area of autonomous systems were asked to
give an informal definition of autonomy software. The
spectrum of answers showed some strong commonalities, but on
several important aspects the answers diverged widely. In order
to illustrate the characteristics of autonomy software more
clearly, let us discuss three examples of NASA autonomy
software.

The SAFM (Shuttle Abort Flight Management) system is a
piece of software that has been developed for the Shuttle
update program [4]. It is a Shuttle on-board system that
advises the Shuttle crew about launch abort options in case of
an engine failure. In such a case, SAFM calculates the most 
appropriate profile of the abort flight path (e.g., which emer- 
gency landing strip to use as well as navigational aid) and dis- 
plays these data to the Shuttle pilot, who then will make the 
final decision and carry out the required operations. In the 
view of NASA managers, SAFM is an autonomous software 
system, since it operates without ground control. Obviously, 
SAFM is safety critical, although the software itself has no 
means to actually control the Shuttle at any time. 

The DART (Demonstration of Autonomous Rendezvous
Technology) spacecraft (see footnote 2) was intended as a
prototype to demonstrate automatic (and unguided) in-space
rendezvous and docking. Using its on-board sensors (image
processing, radar, inertial navigation, and GPS), the autonomy
software on board the spacecraft controls the spacecraft to
attempt a rendezvous with another satellite without human
intervention. Due to problem(s) not yet determined, however,
the approach was automatically aborted before the target had
been reached. Such an autonomous software system provides a
higher level of autonomy, as the software can and has to
control the entire system for an extended period of time. Still,
the system has a fixed “goal” and the number of external
(environmental) parameters is relatively small.

1 www.research.ibm.com/autonomic

2 www.nasa.gov/mission_pages/dart/main/

An autonomous planning and scheduling system for a Mars
rover (e.g., the PLEXIL [7] planning and execution system)
has an even more complex task. Based upon initial high-level
goals, the system has to automatically develop a plan for
achieving these goals. This plan has to fulfill all constraints
before it can be executed. During the execution of the plan, the
state of the system or the environment might change, making
it necessary for the autonomous system to re-plan the entire
activity, and possibly even to revise the achievable goals.
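The plan, execute, and re-plan cycle described above can be illustrated with a toy example. The following Python sketch is not PLEXIL and does not reflect its language or API; the goal names, the energy budget used as the only constraint, and the functions `make_plan` and `run_mission` are all hypothetical.

```python
def make_plan(energy, goals, cost):
    """Toy planner: keep goals in priority order and drop every goal
    whose cost would violate the remaining energy budget."""
    plan = []
    for goal in goals:
        if cost[goal] <= energy:
            plan.append(goal)
            energy -= cost[goal]
    return plan

def run_mission(energy, goals, cost, energy_loss_at_step):
    """Toy executive: execute the plan step by step and re-plan the
    remaining goals whenever the environment changes unexpectedly
    (modelled here as an energy loss at a given step)."""
    remaining = list(goals)
    achieved, replans = [], 0
    plan = make_plan(energy, remaining, cost)
    step = 0
    while plan:
        loss = energy_loss_at_step.get(step, 0)
        step += 1
        if loss:
            # The state no longer matches the planner's assumptions.
            energy -= loss
            replans += 1
            plan = make_plan(energy, remaining, cost)
            continue
        goal = plan.pop(0)
        energy -= cost[goal]
        achieved.append(goal)
        remaining.remove(goal)
    return achieved, replans, energy

cost = {"image_rock": 3, "drill": 5, "drive_home": 2}
result = run_mission(10, ["image_rock", "drill", "drive_home"], cost, {1: 4})
print(result)
```

In this run the unexpected energy loss forces one re-planning step, after which the goal `drill` is no longer achievable and is dropped. Even in such a small example, the behavior depends on the planner, the executive, and the model of costs, all of which have to be verified.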

Although the various kinds of autonomy software work from
very different requirements, they share (at least) one
characteristic: the autonomy software is mission and safety
critical, which means that failure of the autonomy software
can lead to mission failure and could endanger human life.
Therefore, verification and validation (V&V) of autonomy
software is an extremely important issue. In the following
sections, we discuss results from our survey regarding
the major characteristics of autonomy software and the
software engineering task of building such a system.

Characteristics of Autonomy Software 

Despite differences in detail on what constitutes an autonomous
system, there was clear agreement that “the autonomous
system must be able to execute a number of basic steps
without human intervention in order to achieve a given goal”
(4.14/4.25; see footnote 3).

As to the major components of a software architecture for
autonomy software, answers diverged more. Asked if a typical
autonomy system contains a planner, an executive (for execution
of the plan), and a state estimation component, autonomy
experts tended to agree somewhat more strongly (by approximately
1 of the 5 levels) than the software engineering experts. Asked
if the structure and complexity of an autonomous system (AS)
are the same as for a comparable traditional software system,
we received a slightly negative but inconclusive answer
(2.71/2.17). On the other hand, there was no clear indication
that an autonomy software system should contain AI-based or
machine-learning algorithms (3.14/3.17), be agent based
(3.00/2.17), be model-oriented/model-based (2.57/3.00), or
have non-deterministic elements (3.29/3.00).

3. Engineering an Autonomous System 

An autonomous software system is a complex, safety-critical
piece of software of considerable size. Therefore, autonomy
software (like any software) must be designed and engineered
carefully. Traditionally, the software lifecycle phases
of design, implementation, testing, and deployment are
distinguished. Typical software processes order these phases
sequentially (e.g., waterfall model) or in spirals. A graphical
representation of the phases of software development,
starting from the system requirements, is depicted in
Figure 1. Solid lines show the dependencies during the actual
development; dashed lines show how the verification tasks relate
to previous stages in a backwards manner. Finally, the dotted
validation lines indicate how the software is tested against the
appropriate requirements and specifications.

3 In this paper, A/B denotes that the mean value of the answers
(on a range from 1 (=disagree) to 5 (=fully agree)) given by
autonomy experts was A and that given by software engineering
experts was B.
Figure 1. V-shaped software development and V&V
process. Dependencies between the development phases are
indicated by solid arrows. Dashed arrows concern
verification activities; dotted lines validation activities.


In our survey, we asked how the special characteristics of
autonomous software systems are, and need to be, reflected
in the way such software is designed and implemented.
Most software engineering experts expressed the opinion that
an autonomous system (AS) cannot be developed using the same
software development process as traditional software (2.00).
For autonomy experts, using the same process seemed to be a
reasonable option (3.29). Likewise, autonomy experts tended to
see less need for special expertise or experience in the
development team (2.86/3.50). The overall productivity of
software development (in lines of code per person-month) was
rated roughly the same as for traditional software (2.43/3.33).

Members of both groups emphasized the specific and highly
important role of system requirements (4.00/3.67). These
requirements must not only capture the underlying autonomy
execution machinery but must also express the characteristics
and requirements of the model. Since model accuracy
and consistency are important, such a high rating was expected.

Much less clear were the trends in the design and programming
paradigm. The survey revealed that neither an object-oriented
paradigm (2.71/2.50) nor model-based/model-oriented design
principles (2.57/3.00) are clearly favored. A distributed or
multi-threaded implementation (which might be suitable for an
agent-based design) seemed to be even less important
(2.00/1.33).


Whereas it seems that for the design and implementation of
autonomy software much the same process can be
used to achieve roughly the same productivity, things are very
different for the verification and validation activities. There
was a consensus that current best practices for V&V are not
sufficient for autonomy software (2.14/1.83). Whereas the
autonomy experts are fairly neutral regarding the suitability
of a traditional V&V process, V&V experts call for a different
or augmented V&V process (3.29/1.83). In their opinion,
the effort required for V&V of an autonomy system, compared
to V&V of a traditional system (in order to achieve
the same level of reliability), is also somewhat higher (1.83
compared to 2.57 for autonomy experts). However, the range of
answers was very wide, from “almost none” and “not
a lot” to “50% of our work” and “all of my time”.

According to the survey, additional effort spent on V&V of 
autonomous software is mainly caused by three issues: 

1. the larger size and higher complexity of the valid input
space, which can contain system status, environmental
information, intended goals, and constraints (see footnote 4),

2. the complexity of the program logic required to derive the 
answer in the autonomous system (reasoning, planning, etc.), 
and 

3. the size and complexity of the domain model and descrip- 
tion of the environment. This domain model has to be vali- 
dated in addition to the execution software itself. 
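To give a feeling for the first of these issues, the following Python sketch counts the inputs of a coarsely discretized input space. All numbers and the function name `input_space_size` are hypothetical; they only illustrate how quickly the space grows once histories of values are part of the input.

```python
from math import prod

def input_space_size(dimensions, history_length=1):
    """Number of distinct inputs if each dimension is discretized into
    the given number of values and the last `history_length` snapshots
    of all dimensions are part of the input."""
    per_snapshot = prod(dimensions.values())
    return per_snapshot ** history_length

# Hypothetical discretization (NOT taken from any real system).
dims = {"system_status": 8, "environment": 50, "goals": 10, "constraints": 20}

print(input_space_size(dims))      # a single snapshot
print(input_space_size(dims, 3))   # a history of three snapshots
```

With these assumed numbers a single snapshot already yields 80,000 combinations, and a history of three snapshots yields about 5 x 10^14, which is far beyond what exhaustive testing can cover.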

4. Errors and Risks in Autonomous Systems

Risks can arise from many sources and can show up
during the entire software and system lifecycle. Especially
for a safety-critical system such as autonomy software, the risk
can be considerable, leading to costly mission failures or
potential loss of human life (e.g., for the SAFM advisory
system or UAVs). Each potential risk has a probability
attached that indicates how likely such a failure event is. There
is substantial work in the literature on this topic (e.g., [2] for
software risk identification). In this survey and paper, we
focus on coding errors and risks. Errors that are introduced
during implementation can pose a substantial risk for the
entire software system, as many incidents show. In traditional
safety-critical software, there is a fairly stable list of “usual
culprits”, i.e., errors (or error classes) that tend to be
particularly harmful. Typical examples include buffer overrun
errors, uninitialized variables, and synchronization problems in
distributed systems. Table 1 shows a list of important coding
error classes for safety-critical software (from [5]). For each
error class, the importance of finding such bugs (proportional to
the risk) as well as the difficulty of locating such errors in the
program text is given. This list is the result of a survey; similar
error ratings were obtained in most of the application domains
(NASA, aerospace industry).

Obviously, such errors can show up in autonomy software.
A question that we addressed in our survey was how different
such a list would look in the area of autonomy
software. Results of the survey showed that the coding errors
found in autonomy software are not specific to autonomy
software; most of these errors could be found in traditional
software as well.

4 Often, histories of these values also form part of the valid input space.

On the one hand, this result is reassuring in the sense that no
fundamentally different V&V tools must be designed. On
the other hand, subtleties in autonomy-specific errors (e.g.,
modeling errors) are not yet fully understood.
1

Use of un-initialized variables or constants

3

All variables explicitly declared

5

Proper synchronization in multi-threaded execution

evaluation used as required to achieve desired results

5 

5 

Resource contention 

5 

2 

Exception handling 

5 

5 

The design implemented completely and correctly 

4 

2 

No missing or extraneous functions. 

5 

1 

Error messages and return codes used 

5 

1 

Good code comments 


Table 1. Sample questionnaire for traditional safety-critical code


5. Risk Mitigation 

In order to mitigate the risks of using autonomy, one can
improve the verification and validation process for such systems.
Our survey showed that the surveyed autonomy system
developers did not use any special or custom tools for V&V,
nor did they consider the current best practices for software
V&V adequate to ensure that reliable autonomous systems will
be developed. Current best practice for software V&V is
testing; hence we consider here three additional approaches
to mitigate risk: static analysis, model checking, and runtime
analysis. For each of these techniques, we relate experimental
results to show the expected risk mitigation for each
combination of technique and error class. For brevity, we only
consider a few relevant software error classes that are not
autonomy specific. Note that our survey results indicate that
errors in autonomy software are similar to errors commonly
found in any software system; the difficulty of V&V
therefore lies not in the type of errors but rather in the
complexity of the software and of the environment it operates in.

Methods and Approaches 

We consider the following three advanced V&V techniques 
that can augment testing for autonomy V&V: 


Static analysis: This approach allows a program to be analyzed
without having to execute it, and typically checks for
the potential of run-time errors such as null pointer
dereferences, out-of-bounds array accesses, divide-by-zero, and
use of uninitialized variables. The strength of this technique is
that it evaluates the program operations for all possible
execution environments; its weakness is that it might produce
false warnings.

Model checking: This technique allows the analysis of all
possible program behaviors for essentially any behavioral
property violation, although it performs best on properties
such as deadlocks and race violations that traditional testing
cannot easily detect. Its major weakness is that it does not
scale to large programs.

Run-time analysis: This is an advanced form of testing in which
the execution of the program is monitored to check for some
common errors (such as the potential for deadlocks, data races,
etc.) as well as functional properties of the program under
test. This approach requires program instrumentation, which
can often be achieved through aspect-oriented programming.
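As an illustration of run-time analysis, the following Python sketch checks a recorded execution trace for potential deadlocks by looking for cycles in the lock-order graph. It is a simplified sketch, not the algorithm of any particular tool; the trace format and the function names `lock_order_edges` and `has_cycle` are assumptions, and the instrumentation that would produce such a trace is not shown.

```python
from collections import defaultdict

def lock_order_edges(trace):
    """Build the lock-order graph from an observed trace.
    trace: list of (thread, op, lock) with op in {"acquire", "release"}.
    An edge (a, b) means some thread acquired b while holding a."""
    held = defaultdict(list)
    edges = set()
    for thread, op, lock in trace:
        if op == "acquire":
            for h in held[thread]:
                edges.add((h, lock))
            held[thread].append(lock)
        elif op == "release":
            held[thread].remove(lock)
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the lock-order graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# The observed run did not deadlock: T1 finished before T2 started.
trace = [("T1", "acquire", "A"), ("T1", "acquire", "B"),
         ("T1", "release", "B"), ("T1", "release", "A"),
         ("T2", "acquire", "B"), ("T2", "acquire", "A"),
         ("T2", "release", "A"), ("T2", "release", "B")]
print(has_cycle(lock_order_edges(trace)))
```

The observed run completes normally, yet the monitor reports a potential deadlock because the two threads take the locks in opposite order. This simple check can raise false alarms, for example when a single thread is responsible for the whole cycle.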

Tools and Techniques for V&V 

For static analysis we considered two commercial tools,
namely PolySpace (see footnote 5) and Coverity (see footnote 6).
PolySpace performs a data-flow analysis based on abstract
interpretation and never misses an error, whereas Coverity
performs a path-sensitive analysis and might miss errors.
However, Coverity rarely produces false warnings, whereas
PolySpace produces large numbers of false warnings that the
user needs to evaluate to determine whether they are real
errors or not. The tools also behave quite differently in setup
and running times: Coverity is easy to configure and runs in a
matter of minutes, whereas PolySpace is very complicated to get
running and typically runs for days on programs of the order of
30,000 lines of code. Both tools analyze C and C++ programs,
and they were evaluated on NASA flight code: one unmanned
autonomous mission and a relevant portion of the SAFM code.
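Why an analysis based on abstract interpretation misses no errors but may produce false warnings can be illustrated with a very small interval analysis. The following Python sketch is an illustration under our own assumptions and does not describe how PolySpace works internally; the class `Interval` and the function `check_division` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Abstract value: every concrete value lies in [lo, hi]."""
    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def may_be_zero(self):
        return self.lo <= 0 <= self.hi

def check_division(divisor):
    """Warn whenever the abstract divisor may contain zero."""
    return "warning" if divisor.may_be_zero() else "safe"

x = Interval(1, 10)    # assumed range of an input, e.g. a sensor value
one = Interval(1, 1)

print(check_division(x + one))             # divisor x + 1
print(check_division(x - Interval(5, 5)))  # divisor x - 5
print(check_division(x * x - x * x + one)) # divisor x*x - x*x + 1
```

The first division is proven safe and the second warning is a real error (x = 5). The third divisor always equals 1, but the interval domain forgets that both products refer to the same x and therefore reports a false warning. Evaluating such warnings is the manual effort described above.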

For model checking we used the Java PathFinder (JPF) model
checker (see footnote 7) for Java code. It is an explicit-state
model checker that can handle programs of up to 10,000 lines of
code. Run-time analysis was done with the commercial Temporal
Rover tool as well as the special-purpose Java tool called
JPaX. These tools were all evaluated on a Java version of the
code for a prototype Mars Rover, and the results of the
experiments conducted were first reported in [1].
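The principle of explicit-state model checking can be shown on a toy model of threads and locks. The following Python sketch is not JPF and is far simpler than any real model checker; the program representation and the function `find_deadlock` are hypothetical. It enumerates all interleavings of the threads and reports a state in which no thread can make progress.

```python
from collections import deque

def find_deadlock(programs):
    """Explicit-state search over all interleavings.
    programs: one list of (op, lock) steps per thread, with op in
    {"acquire", "release"}. Returns a deadlocked state as
    (program counters, lock owners) or None if there is none."""
    locks = sorted({lock for prog in programs for _, lock in prog})
    initial = (tuple(0 for _ in programs), tuple(None for _ in locks))
    seen, queue = {initial}, deque([initial])
    while queue:
        pcs, owners = queue.popleft()
        successors = []
        for t, prog in enumerate(programs):
            if pcs[t] == len(prog):
                continue  # thread t has terminated
            op, lock = prog[pcs[t]]
            i = locks.index(lock)
            if op == "acquire" and owners[i] is not None:
                continue  # thread t is blocked on a taken lock
            new_owners = list(owners)
            new_owners[i] = t if op == "acquire" else None
            new_pcs = list(pcs)
            new_pcs[t] += 1
            successors.append((tuple(new_pcs), tuple(new_owners)))
        finished = all(pc == len(p) for pc, p in zip(pcs, programs))
        if not successors and not finished:
            return pcs, owners
        for state in successors:
            if state not in seen:  # visit every state only once
                seen.add(state)
                queue.append(state)
    return None

t0 = [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")]
t1 = [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")]
print(find_deadlock([t0, t1]))
```

For these two threads the search reports the state in which each thread holds one lock and waits for the other. Since the set of visited states grows with every additional thread and variable, the example also hints at why the technique does not scale to large programs.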

As for the classes of errors that we consider here, we focus
on the errors that were present (or are a concern) in the three
systems mentioned above (unmanned and manned NASA
flight software and the Rover code). These are typical errors
that one would anticipate in code and, since the three systems
are all autonomy related, therefore also in autonomy code.


5 www.polyspace.com

6 www.coverity.com

7 javapathfinder.sourceforge.net



After analyzing the three systems with the given techniques,
we formed a qualitative view of the risk mitigation obtained
by each tool; the results are shown in Table 2. Note that
although the experiments reported in [1] produced quantitative
results, the analyses done here were not done in a controlled
fashion, and hence the results are not as precise.

Only in two cases did our experiments confirm that a tool we
believed would perform well actually did perform well:
Coverity on finding uninitialized variables, and model checking
with JPF for detecting deadlocks and data races. The worst cases
are those in which we believed that a tool could do a good job
and it then turned out to be either not applicable at all or to
perform very badly. For example, we believed Coverity could
detect divide-by-zero errors, but in fact it cannot; similarly,
although PolySpace can detect these errors, it flags too many
false warnings for the results to be useful. Model checking can
in theory be very good at finding some of the error classes,
but our current set of experiments was not capable of
determining its strengths.

Note that runtime analysis, which is just an advanced form of
traditional testing, performs best overall. This is mostly
because some of the more behavioral types of errors (such as
faults in error handling code) can only reliably be detected
by running the programs. This is somewhat worrying for risk
mitigation for autonomy in general, since many of the subtle
errors in autonomy software are likely to be in aspects of the
code where static analysis, for example, is not likely to
perform well, such as errors in models. Clearly,
more research will be required to develop new V&V methods
to mitigate some of these risks. Model checking, for example,
has been shown to be valuable for analyzing models used
in vehicle health management, and testing in which the tests
are designed to give coverage of the models seems like a useful
avenue for investigation.

In summary, it seems that the current state of the art in V&V
tools can find errors that are present in autonomy software,
but not as reliably as one would have hoped. In addition,
tools for detecting behavioral errors, which one can easily
argue will be the most difficult to find in autonomy systems,
are not as well developed as those for finding (simple) runtime
errors.

6. Conclusions 

In this paper, we have presented results of a survey about
autonomy software, its characteristics, and V&V issues that
was carried out at NASA in summer 2005. We surveyed
software engineering experts and experts in autonomy software.
Although most of the projects originated from a NASA
background (Shuttle Autonomy, Rovers, Robotics, etc.), the
main characteristics of safety- or mission-critical autonomy
software and the associated verification and validation
challenges seem to be the same across the board, indicating that
the results of our survey can be carried over to other
application domains such as UAVs.


Table 2. Capabilities of V&V tools to detect errors for important error classes. The left column indicates the expected
usefulness of the tool, the right column the usefulness as obtained in our experiments (1=very good, 2=useful, 3=probably not
useful, X=cannot be used, ?=requires more research and experiments).


Dr. Willem Visser received his Ph.D. from the University of
Manchester in 1998. After completing his Ph.D. studies in
October 1998, he started work at the Research Institute for
Advanced Computer Science (RIACS) at NASA Ames. His main
research focus is on the application of model checking to
programming languages. He is one of the main developers of
the Java PathFinder model checker for Java, which won the
2003 TGIR Engineering Innovation award from the Office
of Aerospace Technology at NASA. His current research
focuses on using symbolic execution and model checking for
test-case generation and program proofs, environment
generation, feasible counter-example detection during
abstraction-based model checking, belief logics, and agent
verification.






























